Efficiency Analysis:

Enhancing the Statistical and Evaluative Power of the
Regression-Discontinuity Design



Abstract 



This study describes an analytic procedure that aims at improving the
utility of quantitative program evaluation for decision-makers. The procedure
has three main features: a) For statistical control, it adopts and extends
the regression-discontinuity design. b) For statistical inferences, it
de-emphasizes hypothesis testing in favor of interval estimation. c) It uses
the limits of the confidence interval to qualify the level at which a program
operates, rather than making a simple statement about goal attainment.
Following a step-by-step illustration of the quantitative procedure, we show
how each type of evaluation outcome thus obtained can be linked to a
particular administrative objective and/or orientation, some specific planning
procedures, and a set of corrective/supportive program activities.

Introduction 

In their review of the evaluation process, Stufflebeam, Foley, et al.
(1971) discussed five shortcomings that greatly limit the value of evaluation
to decision-makers in their effort to improve an educational program. These
five shortcomings are: a) the poor linkage between educational theory and
evaluation practices; b) a lack of appropriate designs or even of instruments
for the evaluative tasks; c) the shortage of personnel with a working
knowledge of both evaluation techniques and the decision-making process; d) the
narrowness of quantitative criteria which, too often, lead to the improper
conclusion of no significant difference; e) the esoteric nature or poor
quality of the information generated through the evaluation.

Some of the shortcomings have, since then, been addressed. For instance,
Tallmadge, Horst, and Wood (1975) have adapted and publicized three
quasi-experimental models to guide the assessment of project impact on student
achievement. Strenio, Weisberg, and Bryk (1979) have offered a model of
cognitive growth that can be applied in different evaluation contexts. (See
also Keats, 1983.) Stufflebeam et al. (1971) have shown the relevance of the
work of Braybrooke and Lindblom (1963) for making evaluation results congruent
with the administrative decision-making process. But evaluators are still
grappling with some of these issues: Should the quantitative approach to
evaluation, with its limited focus on program outcomes, be replaced by an
observational, ethnographic approach? If not, is the hypothesis testing
paradigm, so valued in experimental research, appropriate for program
evaluation? How can quantitative information be accurately translated into terms
that are understandable by educational managers?

This paper represents a modest attempt to deal with the last two
questions. It describes a strategy developed over the past three and a half
years for the evaluation of a compensatory education program in an inner-city
school district. The strategy, termed efficiency analysis, has the
following features: a) For statistical control, it makes use of the
regression-discontinuity design. b) For statistical inferences, it
de-emphasizes hypothesis testing in favor of interval estimation. It uses the
boundaries of the confidence interval to describe the level at which the
program operates, rather than making a simple statement about goal attainment.
c) It translates the quantitative description into an unequivocal decision
alternative for the program administrators.

Evaluation Design

The regression-discontinuity design (Campbell and Stanley, 1963) is a
quasi-experimental design appropriate for situations where there is a known
interaction between treatment assignment and ability (achievement, aptitude,
etc.). It has emerged in recent years as one of the most promising quantitative
models for the evaluation of compensatory education. Based on the criterion
of internal validity, the regression-discontinuity design has been
shown to be superior to the norm-referenced model (Linn, 1981), since there
often are multiple academic and contextual differences between the remedial
group under study and the national sample from which test norms are developed.
Based on the criterion of feasibility, the regression-discontinuity design has
been found preferable to the classical experimental/control group approach,
since it is impractical or unethical, in many instances, to withhold
needed services from students in order to set up a comparison group (Wolf,
1981). Beyond the issue of applicability, the design may be most desirable
1) when assignment to the 'treatment' group is based on a definite
cutoff score, i.e., all students with a pretest score below a certain mark
participate in the remedial program, while those above are exempted from it; 2)
when the educational environment includes multiple 'treatments,' and there is
a need to separate the impact of the remedial, supplementary intervention from
that of the general program of instruction. To determine the treatment's
effectiveness, the task of the evaluator is to estimate what the performance
level of the low achieving group would be without the remedial support; then,
one tests whether the actual score for that group is significantly
different from the expected value.

Two variants of this design exist. In the strict
regression-discontinuity approach, separate pretest-posttest regression lines
are obtained for the group above and the group below the cutoff point. Then,
two predicted values for the pretest cutoff score are calculated, by entering
it into each regression equation. A discontinuity in the regression lines,
i.e., a difference between the predicted cutoff values, if significant, is
taken as a measure of program impact. Tallmadge, Horst, and Wood (1975)
propose a modification of the original technique that may be more sensitive to
a possible pretest/program interaction among the low achieving students. In
this version, known as regression-projection, the relationship between the
pretest and the posttest is calculated only for the group of students above
the cutoff score. Then, assuming linearity over the entire range of pretest
scores, a single regression coefficient is used to estimate what the remedial
group's posttest mean would have been under a 'no-treatment' condition. The
formula for making such an estimate reads as:

$$E(\bar{Y}_t) = \bar{Y}_c + b_c\,(\bar{X}_t - \bar{X}_c)$$



[Insert Figure 1 here] 



It simply means that the difference between the high achieving and the 
low achieving group on the posttest is expected to be the same as it was on 
the pretest, except for the imperfect correlation between the two measures. 
Any discrepancy between the projected and the observed posttest mean is 
attributed to the remedial treatment. The two versions of the regression 
design are illustrated in Figure 1. The details of the statistical test to
establish significance of the differences can be found in Sween (1971) for the 
regression-discontinuity, and in Tallmadge and Horst (1974) for the 
regression-projection. 
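
As a minimal sketch of the regression-projection estimate, the following Python function evaluates the projection formula from summary statistics. It assumes, as the grade 7 illustration later in the paper does, that the slope is obtained from the comparison group's pre-post correlation and standard deviations (b_c = r · s_y / s_x). The function and argument names are ours, chosen for illustration, and are not the authors'.

```python
def projected_posttest_mean(y_mean_c, x_mean_c, x_mean_t, r_c, sd_y_c, sd_x_c):
    """Project the treatment group's posttest mean under 'no treatment'.

    Implements E(Y_t) = Y_c + b_c (X_t - X_c), with the slope estimated
    from the comparison group only (assumption: b_c = r_c * sd_y_c / sd_x_c).
    """
    b_c = r_c * sd_y_c / sd_x_c  # regression slope of posttest on pretest
    return y_mean_c + b_c * (x_mean_t - x_mean_c)


# Grade 7 summary statistics reported in the paper (NCE units).
grade7_projection = projected_posttest_mean(
    y_mean_c=55.03, x_mean_c=57.49, x_mean_t=30.64,
    r_c=0.77, sd_y_c=17.00, sd_x_c=12.01,
)
print(round(grade7_projection, 2))
```

With these inputs the function returns a value within a few hundredths of the 25.78 NCE reported for grade 7; the small gap presumably reflects rounding in the published statistics.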
Statistical Analysis 

The statistical tests offered to accompany the regression designs aim at
'proving' a single point: that the program has or has not met its objective.
As such, they follow the hypothesis testing paradigm, which is the one most
commonly used in psychological and educational research. But hypothesis
testing is only one means of deriving statistical inference. As stated by
Hays (1963), "in many circumstances," (and evaluation seems to be exactly one
of these circumstances) "the primary purpose of data collection is not to
test a hypothesis, but rather to obtain an estimate of some parameter" (p.
375). A range of values may be more useful or more stable than a single,
unqualified estimate, given the sampling error affecting most
research data. Rather than just ignoring the sampling error, an evaluator can
place himself or herself on safer ground by dealing straightforwardly with it when
drawing a conclusion about program effectiveness. To do that, one can turn to
another form of statistical inference, the calculation of a confidence
interval.

Ordinarily, in regression analysis, it is possible to establish confidence
intervals for three different quantities: the regression coefficient
itself, the actual score of an individual on the criterion measure, or the
predicted value of a particular pretest score. Given the critical role
accorded to the predicted mean value or to the cutoff in the regression
design, the calculation of the confidence interval is most necessary for each
of these parameters. To obtain the boundaries of the confidence interval, one
can use the following formula adapted from Hays (1963):

$$Y' \pm t\,(\mathrm{est}\,\sigma_{yx})\sqrt{\frac{1}{N} + \frac{(X_t - \bar{X}_c)^2}{N s_x^2}}$$

where: $Y'$ = Predicted posttest mean for the treatment group, or the
       predicted posttest value for the cutoff score

       $X_t$ = Mean of the treatment group on the pretest, or the cutoff score
       on the pretest

       $\bar{X}_c$ = Mean of the control group on the pretest

       $N$, $s_x$ = Sample size and pretest standard deviation of the control
       group

       $\mathrm{est}\,\sigma_{yx}$ = The standard error of estimate adjusted by the sample size

For the t-value, any probability may be retained by the evaluator, depending
on the desired level of confidence. For a 95% confidence interval, t
is set at 1.96.

Two kinds of information can be derived from such an analysis. One kind
pertains to the total change in students' classification or the proficiency
rate of the program; the other concerns the relative amount of gain achieved
by students in the program.

A - Success or Proficiency Rate

When the confidence interval is calculated for the predicted cutoff value
on the posttest, its upper limit indicates the highest possible score that one
would a priori expect for a participant in the remedial program. By
inspecting the score distribution one can, then, determine the percentage of
participants scoring above that mark. Those are students who have made so
much progress that they are no longer in need of remediation. Their
percentage is likely to be small; but it is a clear indication of a program's
impact, and one that is readily understood by administrators. We call this
percentage the proficiency rate yielded by the program.
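
The interval and the proficiency rate described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' program: the function names are ours, the half-width expression is the one implied by the grade 7 arithmetic shown later in the paper, the standard error of estimate (11.03) and the other grade 7 figures are taken from that worked example, and the list of posttest scores is invented purely to show the counting step.

```python
import math


def ci_half_width(t, see, n, x, x_mean_c, sd_x_c):
    """Half-width of the confidence interval around a predicted value.

    t        : critical value (1.96 for a 95% interval, as in the text)
    see      : standard error of estimate, already adjusted for sample size
    n        : size of the group used to fit the regression
    x        : pretest value of interest (treatment-group mean or cutoff)
    x_mean_c : pretest mean of the group used to fit the regression
    sd_x_c   : pretest standard deviation of that group
    """
    return t * see * math.sqrt(1.0 / n + (x - x_mean_c) ** 2 / (n * sd_x_c ** 2))


def proficiency_rate(posttest_scores, upper_limit):
    """Share of participants scoring above the upper limit for the cutoff."""
    above = sum(1 for score in posttest_scores if score > upper_limit)
    return above / len(posttest_scores)


# Grade 7 figures from the paper: interval around the predicted mean ...
half_width_mean = ci_half_width(1.96, 11.03, 59, 30.64, 57.49, 12.01)
# ... and around the predicted value for the cutoff score of 38 NCE.
half_width_cutoff = ci_half_width(1.96, 11.03, 59, 38.0, 57.49, 12.01)

# Hypothetical posttest scores, invented for illustration only.
example_scores = [30.1, 41.0, 28.5, 39.9, 45.2, 33.0]
example_rate = proficiency_rate(example_scores, 33.79 + 5.38)
print(round(half_width_mean, 2), round(half_width_cutoff, 2), example_rate)
```

The two half-widths come out within a few hundredths of the 6.90 and 5.38 NCE reported for grade 7. The interval is narrower at the cutoff because the cutoff lies closer to the comparison group's pretest mean than the treatment-group mean does.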
B - Efficiency Level

When one turns the focus on the predicted mean value, one can obtain some
additional and finer reference points to describe the program. If the actual
posttest mean for the treatment group does not fall within the calculated
interval, one can be 95 percent confident that 'something extraordinary' is
happening with the program. If the observed mean is above the upper limit of
the confidence interval, the impact of the program is definitely positive. On
the other hand, if the observed mean is below the lower limit of the
confidence interval, the return on the program is clearly not what one would
expect. As one can see, the procedure is quite unequivocal about the extreme
cases. One may say that it also increases the likelihood of arriving at a
nonsignificant difference. But even within the region of nonsignificance, it
is possible to set up a gradient of performance, which allows the evaluator to
draw inferences not just about goal attainment, but also about the level at which a
program operates. Indeed, all the bits of information obtained from the
standard statistical analysis can be condensed into one measure that we call
the efficiency index. The term efficiency speaks of the average amount of
progress made by the treatment group participants, relative to their own entry
level and that of students in the control group. Mathematically, it is
calculated according to the following formula:



$$E = \left[1 - \frac{\bar{Y}_c - \bar{Y}_t + b\,(\bar{X}_t - \bar{X}_c)}{t\,(\mathrm{est}\,\sigma_{yx})\sqrt{\frac{1}{N} + \frac{(\bar{X}_t - \bar{X}_c)^2}{N s_x^2}}}\right] \times .5 = \left[1 - \frac{2\,(Y'_t - \bar{Y}_t)}{R}\right] \times .5$$

where: $R$ = the range of points over the confidence interval
       $\bar{Y}_t$ = mean on the posttest for the treatment group
       $Y'_t$ = predicted posttest mean for the treatment group
       $N$, $s_x$ = sample size and pretest standard deviation of the control group

If the observed and the predicted posttest means coincide, the efficiency
index will take the value of .5. If the observed posttest mean corresponds
exactly to the upper limit of the confidence interval, the efficiency index
will take the value of +1. If the observed posttest mean falls precisely at
the lower boundary of the confidence interval, the efficiency index will take
the value of 0.
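
The index can be computed directly from the observed mean, the predicted mean, and the half-width of the confidence interval. The sketch below is ours (the function and variable names are hypothetical) and uses the grade 7 figures reported later in the paper to check the three reference points just described.

```python
def efficiency_index(observed_mean, predicted_mean, half_width):
    """Efficiency index: [1 - 2 (Y' - Y) / R] * .5, with R = 2 * half_width."""
    ci_range = 2.0 * half_width  # R, the full width of the interval
    return (1.0 - 2.0 * (predicted_mean - observed_mean) / ci_range) * 0.5


# Reference points described in the text, using the grade 7 interval.
predicted, half = 25.78, 6.90
at_prediction = efficiency_index(predicted, predicted, half)
at_upper_limit = efficiency_index(predicted + half, predicted, half)
at_lower_limit = efficiency_index(predicted - half, predicted, half)
grade7_index = efficiency_index(34.02, predicted, half)
print(at_prediction, at_upper_limit, at_lower_limit, round(grade7_index, 2))
```

Because the index is a linear rescaling of the distance between observed and predicted means, values above 1 or below 0 simply indicate how far outside the interval the observed mean lies, in units of the interval's range.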

Although the derivation of such an index may seem complex, its merit is
that it tremendously simplifies the reporting of evaluation results to program
administrators. That advantage can be appreciated when one has to deal with a
program implemented at several grade levels. Whenever the efficiency index is
greater than 1, the program is probably exemplary; whenever the efficiency
index is negative, the program is probably in trouble. Even when the index
falls between 0 and 1 (in other words, no statistical significance is
obtained), it is still possible to call attention to different degrees of
efficiency; in that sense, the procedure gets around the no-significant-difference
symptom that Stufflebeam et al. complained about.

The whole procedure is illustrated below with actual data obtained at
four grade levels (2, 3, 7, and 8) for a remedial math program.

In grade 7, for example, students with a pretest score lower than 38 NCEs
(29th percentile rank) were assigned to the remedial program. The average
pretest score for this low achieving group was 30.64 NCE, compared to a mean
of 57.49 for students not participating in the program. Based on the
regression analysis, it was projected that the posttest performance for students
in the first group would be around 25.8 NCE, in the absence of the remedial
program.

$$Y'_t = 55.03 + .77 \left(\frac{17.00}{12.01}\right)(30.64 - 57.49) = 25.78$$

A 95 percent confidence interval was calculated, which extends ± 6.90 NCE
points around that central value.

$$25.78 \pm (1.96)(11.03)\sqrt{\frac{1}{59} + \frac{(30.64 - 57.49)^2}{59 \times (12.01)^2}} = 25.78 \pm 6.90$$

The observed posttest mean for the treatment group was 34.02, and fell outside
of the confidence interval. It actually exceeded its upper limit by 1.34 NCE.
That difference can be translated into an efficiency index equal to:

$$\left[1 - \frac{2\,(25.78 - 34.02)}{13.8}\right] \times .5 = 1.10$$

Clearly, the impact of the program is strongly positive at that grade level,
for the average participating student. It is desirable to determine how many
of the participants will no longer need remedial support. The regression analysis led to
a projected score of 33.79, corresponding to the pretest cutoff of 38 NCE.

$$Y'_c = 55.03 + .77 \left(\frac{17.00}{12.01}\right)(38 - 57.49) = 33.79$$

A 95 percent confidence interval was also estimated for that value, and its
upper limit turned out to be 39.17 NCE (33.79 + 5.38). A study of the
posttest score distribution revealed that 33 percent of the participants
achieved above that level. Similar calculations can be carried out for each
grade.

[Insert Table 1 here]

Management Information

Two questions need to be addressed now: 1) How does one convey that kind
of complex information to administrators in a handy way? 2) How does one
increase the probability that the reported information will indeed be included in
the decision-making process?

A - Making it Accessible

The time-honored way of conveying a great deal of quantitative information
in a handy and attractive way is through graphics. It is at this point
that evaluation is no longer a science, but becomes an art. The evaluator
must be resourceful, and the graphic capabilities of microcomputers are now
available to enhance that resourcefulness. One can use three types of graphs
to summarize the information obtained through the regression-discontinuity
design:

a) Information on the success or proficiency rate of a program may be
reported on a bar graph, as illustrated in Figure 2.

[Insert Figure 2 here]

b) Information on a program's efficiency may be reported in a modified
scattergram as follows. The horizontal axis shows the pretest scores (say in
NCEs) with a clear mark for the cutoff point; the vertical axis shows
different values of the efficiency index. One can divide the area delineated
by these axes into three subfields, by drawing two lines at points 1 and 0,
perpendicular to the efficiency axis. The top line, at point 1, corresponds
to the upper limit of the confidence intervals calculated; it can be referred
to as the optimal efficiency line. The bottom line, at point 0, corresponds to
the lower limit of the confidence intervals calculated; it may be referred to
as the minimal efficiency line. The subfield above the optimal efficiency line
is designated as a net growth area; the subfield between the optimal and the
minimal efficiency lines is designated as a maintenance area; the subfield
below the minimal efficiency line is designated as a breakdown area. The
points in the scatterplot represent the various sites or grade levels at which
the program was implemented. If at a particular grade level the actual
posttest mean falls within the confidence interval for the predicted mean,
that observation will appear between the two efficiency lines; this will
suggest that the remedial program is operating as a maintenance unit, whose
utility is to prevent the deterioration of skills, and thus sustain the
operation of the regular instructional program; in other words, without it,
the regular program of instruction may not be able to function with any kind
of efficacy. If at another grade level the posttest mean exceeds the upper
limit of the confidence interval, that observation will appear above the
optimal efficiency line; this will suggest that the remedial program is
operating as a production unit, capable of creating a net growth in students'
competence. If at still another grade level the posttest mean fails to reach
the lower limit of the confidence interval, that observation will appear below
the minimal efficiency line; this will suggest that the remedial program is in
disrepair. The whole procedure for reporting information on program
efficiency is depicted in Figure 3.

[Insert Figure 3 here]

c) The two kinds of information on efficiency and success/proficiency rate
can be integrated in one diagram, called a performance record. As shown in
Figure 4, each grade level is represented at the center of the diagram. The
measures of program performance are indicated numerically at the periphery,
and graphically as grooves on the record. The inner marks stand for the degree
of program efficiency, while the outer marks stand for proficiency.

[Insert Figure 4 here]

These three types of graphs can be attached to the Executive Summary for the
evaluation report.
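
The three areas of the efficiency scattergram amount to a simple classification rule on the efficiency index. The following Python sketch is ours, with a hypothetical function name; it applies the rule to the indices reported in Table 1. Treating values of exactly 0 or 1 as belonging to the maintenance area is an assumption, since the text does not say how boundary cases are handled.

```python
def performance_area(efficiency_index):
    """Map an efficiency index to the area it falls in on the scattergram."""
    if efficiency_index > 1.0:
        return "net growth"   # above the optimal efficiency line
    if efficiency_index < 0.0:
        return "breakdown"    # below the minimal efficiency line
    return "maintenance"      # between the two lines (boundaries included)


# Efficiency indices reported in Table 1, keyed by grade level.
table1_indices = {2: 0.65, 3: -0.08, 7: 1.10, 8: 0.29}
areas = {grade: performance_area(index) for grade, index in table1_indices.items()}
print(areas)
```

Under this rule grade 7 falls in the net growth area, grade 3 in the breakdown area, and grades 2 and 8 in the maintenance area.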
B - Making it Practical 

In order to make the information he/she generates relevant to the
decision-making process, the evaluator must have a good understanding of that
process. The understanding should be based on empirical evidence about the
overall program environment, and should also be guided by a theoretical
framework. Previous research suggests that the process of rational
decision-making follows four principles. What are those principles and what
do they entail?

1. A decision requires a clear information base. 

The information base, which is of course nothing other than previous
evaluation results, may indicate one of three things: a) a given program is
capable of producing net academic growth, i.e., its efficiency index is
greater than 1; b) a given program operates as a maintenance unit, i.e., its
efficiency index is between 0 and 1; c) a given program is experiencing a
breakdown, i.e., its efficiency index is lower than 0.

2. A decision is always inscribed within a general approach to management.

Following Stufflebeam et al. (1971), we distinguish three possible
approaches in an educational setting: a) a homeostatic approach, intended to
sustain the achieved balance in a program; b) an incremental approach, aimed
at "shifting the program to a new balance based upon small serial
improvements" (p. 09); c) a neomobilistic approach geared for a large and significant
change necessitated by critical program conditions.

3. A decision calls for selection or design of specific procedures to be
followed.

This principle really speaks of the planning stage in the process. a)
Planning may consist in simply standardizing or operationalizing the
procedures presently in use. b) Another possibility is to target particular areas
where the need is the greatest, or where resource allocation will be most
efficient. c) Still another alternative is to reorganize a program in all its
aspects, adjusting the objectives, providing new means, redefining personnel
roles, setting check points for accountability.

4. A decision involves translating a set of selected procedures into
activities in order to meet an objective.

Three courses of action may be followed: a) one can continue or recycle
a set of practices proven to be successful; b) one can offer training and
other activities in staff development; c) one can move to enforce or implement
available guidelines/procedures where numerous discrepancies have been found
between a program's objectives and modus operandi.

Stufflebeam et al. insist that the ultimate objective of a rational
decision-making process, similar to the one outlined above, is educational
improvement. While no educator would contest that view, it has been our
experience that a number of immediate goals often supersede the ultimate
objective. These immediate administrative goals fall into three categories:
those aimed at producing change, those aimed at achieving control, and those aimed
at promoting or marketing a particular program or position for public
relations purposes. These immediate goals, because of the rather quick payoffs
associated with them, are the guiding lights of management. So, the
evaluation results must be articulated to them in order to sensitize the
decision-makers. We propose a restructuring of the decision-making model to
[...] Figure 5 depicts this new structure.

The model establishes a correspondence between each immediate goal and
the type of elements in the decision-making process which it seems most
congruent with. It can be of great utility to the evaluator in formulating
his/her recommendations for program development. Depending on the kind of
evaluation results obtained (i.e., the value of the efficiency index), a
particular administrative approach, some specific planning procedures, and a
set of corrective/supportive activities may be suggested. That kind of
detailed, facultative work has a good probability of catching the attention
of the decision-makers.
Table 1

Statistical Data for Chapter I and Nonchapter I Students in Mathematics

                                   Grade 2         Grade 3         Grade 7         Grade 8
Parameters                       Treat.   Cont.  Treat.   Cont.  Treat.   Cont.  Treat.   Cont.

 1. Pretest Mean                  32.04   64.80   23.27   60.00   30.64   57.49   30.09   56.83
 2. SD of Pretest                 11.26   14.89    9.91   16.73    8.31   12.01    9.36   14.46
 3. Posttest Mean                 37.70   58.94   32.98   59.13   34.02   55.03   37.40   56.02
 4. SD for Posttest               17.27   19.39   10.95   16.31   11.88   17.00    8.15   14.58
 5. Cutoff Score                  41.90       -   28.20       -   38.00       -   38.00       -
 6. Pre-Post Correlation              -     .57       -     .39       -     .77       -     .59
 7. Sample Size (N)                  70      65      64      61      58      59      66      60
 8. Expected Post Mean            34.75           44.70           25.78           40.12
 9. Confidence Interval for (8)   ±9.48          ±10.06           ±6.90           ±6.53
10. Expected Value for Cutoff     44.29           47.04           33.79           44.82
11. Efficiency Index               +.65            -.08           +1.10            +.29
12. Proficiency Index               17%              2%             33%              5%
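
As a consistency check on Table 1, the efficiency index in row 11 can be recomputed from rows 3, 8, and 9. The short Python sketch below is ours, not part of the original analysis; it uses the form .5 + (observed - expected) / R, which is algebraically the same as the efficiency formula given in the text, with R taken as twice the ± value in row 9.

```python
# Rows 3, 8, 9 and 11 of Table 1 for the treatment groups, by grade:
# (observed posttest mean, expected posttest mean, CI half-width, reported index)
table1 = {
    2: (37.70, 34.75, 9.48, 0.65),
    3: (32.98, 44.70, 10.06, -0.08),
    7: (34.02, 25.78, 6.90, 1.10),
    8: (37.40, 40.12, 6.53, 0.29),
}

recomputed = {}
for grade, (observed, expected, half_width, reported) in table1.items():
    # Efficiency index = .5 + (observed - expected) / R, with R = 2 * half-width.
    recomputed[grade] = 0.5 + (observed - expected) / (2.0 * half_width)
    print(grade, round(recomputed[grade], 3), reported)

# Largest absolute gap between recomputed and reported indices.
max_gap = max(abs(recomputed[g] - table1[g][3]) for g in table1)
```

The recomputed values agree with the reported indices to within about one hundredth, which is what one would expect from rounding in the published figures.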
Fig. 1. [...] with treatment effect independent of pretest status
([...] and Horst, 1976). Horizontal axis: pretest scores.

FIGURE 2 - PROFICIENCY DATA: PERCENTAGE OF STUDENTS 'GRADUATING OUT' OF THE
REMEDIAL PROGRAM AT EACH GRADE LEVEL SERVED

FIGURE 3 - PROGRAM EFFICIENCY LEVEL AT FOUR GRADES

FIGURE 4 - MATH RECORD

(Each line is worth 4 percentage points. Inner marks represent
the degree of program efficiency. Outer marks represent the
degree of program effectiveness.)

FIGURE 5 - STRUCTURE OF RATIONAL DECISION-MAKING